Add support for standalone snapshot creation #467
Conversation
@dillaman I am facing one issue while mounting the cloned PVC.
rbd volume info:
@Madhu-1 sorry for the delay, I didn't see the github notification email.
What kernel are you using? v4.16 (and later) should ignore the "operations" feature flag since it's a no-op for krbd. RHEL 8 should be based on at least v4.18 (I believe).
@dillaman I am using that as well; I need to check what Travis CI is using.
@Madhu-1 OK, so in addition to the Ceph release restriction for PV cloning (and standalone snapshot creation as the intermediary step), we need to have a semi-recent kernel (like RHEL 8 or CentOS 7 w/ kernel-ml).
@Madhu-1 Looks like Travis would be using Xenial [1]. Bionic should be on the v4.15 kernel and that feature was backported to v4.15 [2], so it might be worth a test. [1] https://github.com/ceph/ceph-csi/blob/master/.travis.yml#L4
@dillaman Thanks for the hint, I have tested it locally and it's working with the 5.2.1 kernel on CentOS (I will test on Travis with Bionic).
force-pushed from acb5f43 to 813e0af
With the current implementation in ceph-csi, it's not possible to delete the cloned volume if the snapshot is present, due to the child linking. To remove this dependency we had a discussion and came up with an idea to separate out the clone and snapshot, so that we can delete the snapshot and cloned image in any order. The steps followed to create an independent snapshot are as follows:

* Create a temporary snapshot from the parent volume
* Clone a new image from the temporary snapshot with options `--rbd-default-clone-format 2 --image-feature layering,deep-flatten`
* Delete the temporary snapshot created
* Create a snapshot with the requested name
* Clone a new image from the snapshot with user-provided options
* Check the depth of the image, as there is a maximum number of nested volume clones (max 16, can be changed based on the configuration); if the depth is reached, flatten the newly cloned image
* Delete the cloned image (earlier we were removing the image with the `rbd rm` command; with the new design we will be moving the images to the trash); the same applies for normal volume deletion also
* Delete the temporary cloned image which was created for a snapshot
* Delete the snapshot

example commands:

```
1) rbd snap create <RBD image for src k8s volume>@<random snap name>
2) rbd clone --rbd-default-clone-format 2 --image-feature layering,deep-flatten <RBD image for src k8s volume>@<random snap name> <RBD image for temporary snap image>
3) rbd snap rm <RBD image for src k8s volume>@<random snap name>
4) rbd snap create <RBD image for temporary snap image>@<random snap name> <k8s snap name>
5) rbd clone --rbd-default-clone-format 2 --image-feature <k8s dst vol config> <RBD image for temporary snap image>@<random snap name> <RBD image for k8s dst vol>
```

Signed-off-by: Madhu Rajanna <[email protected]>
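The same sequence, expressed as a minimal Go sketch that shells out to the `rbd` CLI. The helper `execRBD`, the function name, and all image/snapshot names here are hypothetical illustrations, not the actual ceph-csi code:

```go
package main

import (
	"fmt"
	"os/exec"
)

// execRBD is a hypothetical helper that runs one rbd CLI command and
// wraps any failure with the command's combined output.
func execRBD(args ...string) error {
	out, err := exec.Command("rbd", args...).CombinedOutput()
	if err != nil {
		return fmt.Errorf("rbd %v failed: %v (%s)", args, err, out)
	}
	return nil
}

// createStandaloneSnapshot mirrors the steps from the commit message: a
// temporary snapshot and an intermediate clone are used so the final
// snapshot carries no child link back to the parent image.
func createStandaloneSnapshot(pool, srcImage, tmpImage, snapName string) error {
	tmpSnap := srcImage + "@csi-tmp-snap" // a random name in practice

	steps := [][]string{
		// 1) temporary snapshot on the source image
		{"snap", "create", pool + "/" + tmpSnap},
		// 2) clone-format v2 clone from the temporary snapshot
		{"clone", "--rbd-default-clone-format", "2",
			"--image-feature", "layering,deep-flatten",
			pool + "/" + tmpSnap, pool + "/" + tmpImage},
		// 3) the temporary snapshot is no longer needed
		{"snap", "rm", pool + "/" + tmpSnap},
		// 4) the user-visible snapshot is taken on the intermediate clone
		{"snap", "create", pool + "/" + tmpImage + "@" + snapName},
	}
	for _, args := range steps {
		if err := execRBD(args...); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	if err := createStandaloneSnapshot("rbd", "pvc-src", "csi-snap-img", "snap1"); err != nil {
		fmt.Println(err)
	}
}
```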
Up for initial review.
Question: these changes will require Mimic+ clusters. I am 100% fine w/ requiring that, but I wanted to double-check if that was the intent or if backward compatibility with creating old-style snapshots needs to be maintained?
```diff
@@ -47,7 +47,8 @@ var (
 	metadataStorage = flag.String("metadatastorage", "", "metadata persistence method [node|k8s_configmap]")

 	// rbd related flags
-	containerized = flag.Bool("containerized", true, "whether run as containerized")
+	containerized    = flag.Bool("containerized", true, "whether run as containerized")
+	rbdMaxCloneDepth = flag.Uint("rbdmaximumclonedepth", 16, "Maximum number of nested volume clones that are taken before a flatten occurs")
```
Nit: while 16 is technically the maximum, I think it should be set much lower before a flatten is required since a deeper depth will impact IO speeds. For example, Cinder uses a max depth of 5 [1], so perhaps copy that as a starting point.
Additionally, in the future it might be nice to have a soft vs hard depth limit (i.e. when you hit the soft depth, it doesn't block PV/snapshot creation but it kicks off a background `ceph rbd task add flatten` task, whereas the hard limit does block until the flatten is complete). In that case, can we name this something like `rbdhardmaxclonedepth`?
[1] https://github.com/openstack/cinder/blob/master/cinder/volume/drivers/rbd.py#L83
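A minimal sketch of the soft/hard split being suggested here; the flag names, defaults, and the `getCloneDepth` helper are assumptions for illustration, not existing ceph-csi code:

```go
package main

import (
	"flag"
	"fmt"
	"os/exec"
)

var (
	// hypothetical flags for the suggested soft vs hard depth limits
	rbdSoftMaxCloneDepth = flag.Uint("rbdsoftmaxclonedepth", 4, "clone depth that triggers a background flatten")
	rbdHardMaxCloneDepth = flag.Uint("rbdhardmaxclonedepth", 8, "clone depth that blocks until a flatten completes")
)

// getCloneDepth is assumed to walk the parent chain of the image and
// return how many clones deep it is (body elided in this sketch).
func getCloneDepth(pool, image string) (uint, error) {
	return 0, nil
}

// maybeFlatten blocks on a flatten at the hard limit and only schedules a
// background MGR task at the soft limit.
func maybeFlatten(pool, image string) error {
	depth, err := getCloneDepth(pool, image)
	if err != nil {
		return err
	}
	switch {
	case depth >= *rbdHardMaxCloneDepth:
		// block the PV/snapshot creation until the image is flattened
		return exec.Command("rbd", "flatten", pool+"/"+image).Run()
	case depth >= *rbdSoftMaxCloneDepth:
		// schedule a background flatten via the MGR task queue
		return exec.Command("ceph", "rbd", "task", "add", "flatten", pool+"/"+image).Run()
	}
	return nil
}

func main() {
	flag.Parse()
	fmt.Println(maybeFlatten("rbd", "csi-vol-xyz"))
}
```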
```diff
@@ -88,6 +88,7 @@ spec:
         - "--v=5"
         - "--drivername=$(DRIVER_NAME)"
         - "--containerized=true"
+        - "--rbdmaximumclonedepth=14"
```
Nit: leave it at its default and don't override?
```diff
@@ -85,6 +85,7 @@ spec:
         - "--v=5"
         - "--drivername=rbd.csi.ceph.com"
         - "--containerized=true"
+        - "--rbdmaximumclonedepth=14"
```
Nit: same comment here
```go
this will help us storing the snap info on rados and also helps in cleaning the
stale snaps
*/
func (cj *CSIJournal) GetTmpNamePrefix(sufix string, first, isSnap bool) string {
```
Nit: `suffix` instead of `sufix`
```go
this will help us storing the snap info on rados and also helps in cleaning the
stale snaps
*/
func (cj *CSIJournal) GetTmpNamePrefix(sufix string, first, isSnap bool) string {
```
Might be better to split into two functions -- one for volumes and the other for snaps. Having to pass the `isSnap` param just to pick an entirely different logic path seems odd to me.
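A sketch of the suggested split; the struct field and the prefix values are assumptions, not the real journal definitions:

```go
package journal

// CSIJournal here is a minimal stand-in for the ceph-csi type; only the
// namingPrefix field is assumed for this sketch.
type CSIJournal struct {
	namingPrefix string
}

// GetTmpVolNamePrefix returns the prefix used for the temporary
// (intermediate) clone image created while building a standalone snapshot.
func (cj *CSIJournal) GetTmpVolNamePrefix(suffix string) string {
	return cj.namingPrefix + "temp-vol-" + suffix
}

// GetTmpSnapNamePrefix returns the prefix used for temporary snapshots.
func (cj *CSIJournal) GetTmpSnapNamePrefix(suffix string) string {
	return cj.namingPrefix + "temp-snap-" + suffix
}
```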
```go
		return status.Error(codes.Internal, err.Error())
	}

	volErr = restoreSnapshot(rbdVol, rbdSnap, cr)
```
Nit: can we rename `restoreSnapshot` to clarify that it's actually just performing an "rbd clone"? i.e. `cloneSnapshot` or `cloneImage`?
```go
	err = cs.doSnapshot(rbdSnap, cr)
	// generate temp snap struct and vol struct to create a temprory snapshot
	// and to create a new volume which will be used to take new snapshot
	tmpSnap := &rbdSnapshot{}
```
Perhaps just my opinion, but it's confusing that `rbdSnapshot` and `rbdVolume` are being re-used to describe internal RBD images / snapshots that don't have associated CSI snapshots and volumes. It might be nice to re-org these structures into a parent struct that is only the RBD data and a derived struct that adds CSI request related fields.
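One way that re-org could look: a plain RBD-only base struct embedded by the two CSI-facing types. All field names below are illustrative, not the current ceph-csi definitions:

```go
package rbd

// rbdImage holds only the RBD-level identity of an image; it carries no
// CSI request context and can back temporary/internal images as well.
type rbdImage struct {
	Pool      string
	ImageName string
	Monitors  string
	ClusterID string
}

// rbdVolume embeds the RBD data and adds the fields that only exist when
// the image backs a CSI volume request.
type rbdVolume struct {
	rbdImage
	VolID       string
	RequestName string
	SizeBytes   int64
}

// rbdSnapshot embeds the RBD data and adds CSI snapshot request fields.
type rbdSnapshot struct {
	rbdImage
	SnapID         string
	RequestName    string
	SourceVolumeID string
}
```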
"failed to delete snapshot: %s/%s with error: %v", | ||
rbdSnap.Pool, rbdSnap.RbdSnapName, err) | ||
} | ||
|
||
if found { | ||
// TODO need to delete stale volumes |
What are stale volumes?
klog.V(4).Infof("rbd: rm %s using mon %s, pool %s", image, pOpts.Monitors, pOpts.Pool) | ||
args := []string{"rm", image, "--pool", pOpts.Pool, "--id", cr.ID, "-m", pOpts.Monitors, | ||
klog.V(4).Infof("rbd: trash mv %s using mon %s, pool %s", image, pOpts.Monitors, pOpts.Pool) | ||
args := []string{"trash", "mv", image, "--pool", pOpts.Pool, "--id", cr.ID, "-m", pOpts.Monitors, |
This will need to be combined w/ the `ceph rbd task add trash remove` call, otherwise we will just be leaking images to the trash.
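A sketch of pairing the two calls so images are not leaked to the trash; if scheduling the MGR task fails, the error is returned so the CO request fails. The function and parameter names are assumptions:

```go
package main

import (
	"fmt"
	"os/exec"
)

// deleteImage moves the image to the trash and then schedules the actual
// removal through the MGR task queue. imageID is the RBD image id (looked
// up beforehand, e.g. from `rbd info`), since the trash is addressed by id
// rather than by name.
func deleteImage(pool, image, imageID string) error {
	if out, err := exec.Command("rbd", "trash", "mv", pool+"/"+image).CombinedOutput(); err != nil {
		return fmt.Errorf("rbd trash mv failed: %v (%s)", err, out)
	}

	// Without this step the image would simply leak into the trash, so a
	// failure here is propagated and the CO request fails.
	if out, err := exec.Command("ceph", "rbd", "task", "add", "trash", "remove", pool+"/"+imageID).CombinedOutput(); err != nil {
		return fmt.Errorf("scheduling trash remove failed: %v (%s)", err, out)
	}
	return nil
}

func main() {
	fmt.Println(deleteImage("rbd", "csi-vol-xyz", "86804d9ae6a1"))
}
```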
@dillaman as you know, currently we are doing `rbd task add remove` to delete the image from the backend. With this implementation, do I need to replace `rbd task add remove` with `rbd trash mv` and `rbd task add trash remove` to delete the image from the backend? If the user does not have the manager permission, is `rbd trash mv` enough?
I think in a perfect world, if `ceph rbd task add trash remove` fails, the CSI should roll back and fail the CO request. Otherwise, you are just leaking images into the trash, and if folks don't know to go looking in the trash, they will just cause a hidden problem down the road.
@dillaman Thanks for your input, I will keep this in mind, so that I can replace `rbd task add remove` with `rbd trash mv` and `rbd task add trash remove`. But what do we have to do if the user doesn't have permission to execute `rbd task add trash remove`? Do I need to fall back to using `rbd trash rm`?
You would need to delete the clone and run `rbd trash rm`.
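A sketch of that fallback path, under the assumption that a permission failure from the MGR task API can be detected from the command output; the function name and error check are illustrative only:

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// removeTrashedImage first tries to schedule the removal via the MGR task
// queue; if the credential lacks mgr permissions it falls back to a direct
// (blocking) `rbd trash rm`. Any dependent clone would have to be removed
// before the fallback can succeed (elided here).
func removeTrashedImage(pool, imageID string) error {
	spec := pool + "/" + imageID

	out, err := exec.Command("ceph", "rbd", "task", "add", "trash", "remove", spec).CombinedOutput()
	if err == nil {
		return nil
	}
	// crude permission check; a real implementation would inspect the
	// error more carefully
	if strings.Contains(string(out), "access denied") || strings.Contains(string(out), "EACCES") {
		if out, err := exec.Command("rbd", "trash", "rm", spec).CombinedOutput(); err != nil {
			return fmt.Errorf("rbd trash rm %s failed: %v (%s)", spec, err, out)
		}
		return nil
	}
	return fmt.Errorf("scheduling trash remove for %s failed: %v (%s)", spec, err, out)
}

func main() {
	fmt.Println(removeTrashedImage("rbd", "86804d9ae6a1"))
}
```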
Thanks for the quick response @dillaman
```go
	if err != nil {
		return nil, err
	}
	err = updateReservedSnap(rbdSnap, cr)
```
Why do we need to update the metadata? Shouldn't we know all the finalized names before we start the process?
@Madhu-1 can you revisit this PR?
This pull request now has conflicts with the target branch. Could you please resolve conflicts and force push the corrected changes? 🙏
Syncing latest changes from devel for ceph-csi
Describe what this PR does
With the current implementation in ceph-csi, it's not possible to delete the cloned volume if the snapshot is present, due to the child linking. To remove this dependency we had a discussion and came up with an idea to separate out the clone and snapshot, so that we can delete the snapshot and cloned image in any order.

The steps followed to create an independent snapshot are as follows:

* Create a snapshot (the intermediate clone is created with `--rbd-default-clone-format 2 --image-feature layering,deep-flatten`)
* Create a new PVC from a snapshot (check the depth of the image, as there is a maximum number of nested volume clones, max 16, which can be changed based on the configuration; if the depth is reached, flatten the newly cloned image)
* Delete the cloned PVC (earlier we were removing the image with the `rbd rm` command; with the new design we will be moving the images to the trash); the same applies for normal volume deletion also
* Delete the snapshot

Example commands: same as in the commit message above.

Signed-off-by: Madhu Rajanna [email protected]
Is there anything that requires special attention
Do you have any questions?
Is the change backward compatible?
No backward compatibility with previously created snapshots
Fixes: #issue_number
Future concerns
List items that are not part of the PR and do not impact its functionality, but are work items that can be taken up subsequently.
@dillaman @ShyamsundarR @humblec PTAL